Cancer Epidemiology, Biomarkers & Prevention

American Association for Cancer Research (AACR)

Preprints posted in the last 90 days, ranked by how well they match Cancer Epidemiology, Biomarkers & Prevention's content profile, based on 17 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Impact of surveillance colonoscopy on colorectal cancer incidence and mortality in Lynch syndrome - a national observational cohort study of patients in the English NHS 2010-2022

Huntley, C.; Loong, L.; Mallinson, C.; Rahman, T.; Torr, B.; Allen, S.; Allen, I.; Hassan, H.; Fru, Y. W. J.; Tataru, D.; Paley, L.; Vernon, S.; Houlston, R.; Muller, D.; Lalloo, F.; Shaw, A.; Burn, J.; Morris, E.; Tischkowitz, M.; Antoniou, A. C.; Pharoah, P. D. P.; Monahan, K.; Hardy, S.; Turnbull, C.

2026-04-22 oncology 10.64898/2026.04.16.26351020 medRxiv
Top 0.1%
12.2%

Background: Lynch syndrome (LS) is a cancer susceptibility syndrome caused by germline pathogenic variants in DNA mismatch repair (MMR) genes. Due to increased risk of colorectal cancer (CRC), enhanced colonoscopic surveillance is recommended for heterozygote MMR-carriers. Objective: Using a registry of English LS patients linked to digital National Health Service records, we aimed to assess adherence of MMR-carriers to national surveillance guidelines, and to determine the impact of surveillance on CRC incidence and mortality. Design: We described the frequency of colonoscopies in 4,732 MMR-carriers and used logistic regression to determine predictors of surveillance adherence. For MMR-carriers with a record of surveillance and those without, we: estimated age-specific annual CRC incidence rates (AS-AIRs) and cumulative lifetime risks, assessed for stage-shift by comparing CRC stage distributions and stage-specific AS-AIRs, and estimated risks of death from CRC and any cause using Kaplan-Meier methods and Cox Proportional Hazards regression. Results: Surveillance at a mean interval of ≤ 3 years (n=3028) was associated with a decrease in CRC-specific and all-cause mortality, without an associated change in total CRC incidence, even after multivariate adjustment. No strong evidence of stage-shift was observed. Colonoscopic surveillance at a mean interval of ≤ 2 years (n=1569) was associated with an increase in total CRC incidence. Incidence of early-stage cancers was also higher, with no corresponding decrease in late-stage cancers, which may reflect the short follow-up period or the impact of overdiagnosis. Conclusion: The observed reduction in all-cause mortality amongst regularly-surveilled MMR-carriers may indicate an impact of surveillance on CRC-specific mortality, though in the context of a non-randomised study likely reflects the influence of selection bias.
KEY MESSAGES OF ARTICLE. What is already known on this topic: Regular surveillance colonoscopy is recommended in Lynch syndrome, though evidence to support this remains mixed. We searched PubMed for articles published from inception to 01/05/2024 using the terms "Lynch syndrome", "HNPCC", "colonoscopy", "sigmoidoscopy", "surveillance", and "screening". We found one controlled trial and several small analytical studies dating from the early 2000s which compared surveilled and non-surveilled populations and found surveillance to be associated with reduced colorectal cancer (CRC) incidence and improved survival. More recent longitudinal observational studies, most without comparator groups, found a high incidence of CRC in LS populations despite being resident in countries where surveillance was recommended. A small number of studies directly assessed time since last colonoscopy against CRC incidence and stage with mixed findings. Finally, cross-sectional comparisons between countries of CRC incidence rates and surveillance interval recommendations found no relationship between the two [1,2]. What this study adds: Here, we conduct an observational cohort study on a large national cohort of MMR germline pathogenic variant (GPV) carriers (MMR-carriers) in England (n=4,732), comparing CRC incidence and mortality in individuals with a record of regular surveillance to those without. Through linkage of the English National Lynch Syndrome Registry to Hospital Episodes Statistics data, we are uniquely able to study a comprehensive national population of MMR-carriers and identify the dates on which colonoscopies were undertaken over time, allowing assessment of adherence to national surveillance guidelines and the impact this has on CRC outcomes. Notably, receipt of regular colonoscopy was strongly associated with deprivation as well as ethnicity.
The results show that regular surveillance at an average interval of 3 years (or less) is not associated with a reduction in CRC incidence when compared to less frequent surveillance, but an apparent decrease in both CRC-specific and overall mortality is observed, even after adjustment for confounding variables. Conversely, regular surveillance at an average interval of 2 years (or less) is associated with an increase in CRC incidence when compared to less frequent surveillance, which may suggest increased diagnosis of early-stage cancers or, due to the absence of a reduction in late-stage cancers, overdiagnosis. The observed impact of surveillance on overall mortality may demonstrate the impact of surveillance on CRC-specific mortality, or, in the context of an observational (non-randomised) study, indicate that the results are subject to selection bias. How this study might affect research, practice, or policy: Evidence for the benefit of surveillance colonoscopy remains mixed. Whilst polypectomy would be anticipated to prevent CRC development (thus reducing CRC incidence), several studies have observed increased frequency of CRCs in MMR-carriers undergoing frequent surveillance colonoscopy, which may reflect overdiagnosis. The selection bias inherent to observational studies of surveillance renders mortality outcomes challenging to interpret. Randomised controlled trials of colonoscopic surveillance in MMR-carriers are required for effectiveness of this intervention to be accurately assessed. Given ethical and feasibility challenges, randomised controlled trials might be complemented by quasi-experimental designs using advanced observational methods for assessing effectiveness.

2
The Impact of Multi-Cancer Early Detection Tests on Cancer Mortality: A 10-Year Microsimulation Model

Xiao, J.; ElHabr, A. K.; Tyson, C.; Cao, X.; Fendrick, A. M.; Ozbay, A. B.; Limburg, P.; Beer, T. M.; Deshmukh, A. A.; Chhatwal, J.

2026-05-06 oncology 10.64898/2026.05.05.26351205 medRxiv
Top 0.1%
12.0%

Purpose: Early detection of cancer can improve survival following diagnosis. However, routine screening is limited to a few cancer types. Multi-cancer early detection (MCED) tests could substantially expand cancer screening by simultaneously detecting multiple cancer types. This modeling study evaluates the potential impact of an MCED test on cancer outcomes in the US general population. Methods: We developed a microsimulation model of 14 solid tumor cancer types which account for nearly 80% of cancer incidence and mortality. The model was calibrated to reproduce annual incidence rates reported in the Surveillance, Epidemiology, and End Results database. Cancer diagnosis could arise from standard-of-care (SoC) procedures or annual MCED testing. MCED sensitivities were derived from a case-control clinical validation study. We simulated the 10-year life course of 5 million US adults aged 50-84 years. The primary outcome was cancer mortality reduction due to MCED testing. Results: In the best case with perfect uptake and adherence, MCED testing added to the SoC led to a 23% decrease in 10-year cancer mortality relative to the SoC alone, translating to 668,600 cancer deaths averted over 10 years. The largest mortality reductions, in absolute terms, were observed for lung (160; 802 versus 962 per 100,000), colorectal (118; 168 versus 284), and pancreatic (50; 238 versus 288) cancer. The largest relative reductions were in cervical (52%), colorectal (41%), and breast (34%) cancer. The population-level life-year gain was 7,158 years per 100,000. Conclusion: MCED testing has the potential to substantially reduce cancer-related deaths and improve outcomes across multiple cancer types.
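The abstract above reports reductions both in absolute terms (deaths per 100,000) and in relative terms. As a minimal sketch of how the two relate, assuming only the lung and pancreatic rates quoted in the abstract, the arithmetic can be written as follows. The function name `mortality_reduction` is a hypothetical helper for illustration and is not taken from the study's model.

```python
def mortality_reduction(soc_rate, mced_rate):
    """Return (absolute, relative) reduction for two rates per 100,000.

    soc_rate:  10-year cancer deaths per 100,000 under standard of care alone
    mced_rate: 10-year cancer deaths per 100,000 with MCED testing added
    """
    absolute = soc_rate - mced_rate
    relative = absolute / soc_rate
    return absolute, relative


# Rates per 100,000 as quoted in the abstract: (SoC alone, SoC + MCED).
quoted_rates = {"lung": (962, 802), "pancreatic": (288, 238)}

for cancer, (soc, mced) in quoted_rates.items():
    absolute, relative = mortality_reduction(soc, mced)
    print(f"{cancer}: {absolute} fewer deaths per 100,000 "
          f"({relative:.1%} relative reduction)")
```

A cancer type with a large absolute reduction can still show a modest relative reduction when its baseline mortality is high, which may be why the two rankings in the abstract differ.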

3
Psychosocial mediators for the impact of personal genomic risk information on melanoma prevention and early detection behaviors

Wang, S. E.; Espinoza, D.; Lo, S.; Smit, A. K.; Cust, A. E.

2026-05-10 epidemiology 10.64898/2026.05.07.26352695 medRxiv
Top 0.1%
10.6%

Background: In the Melanoma Genomics Managing Your Risk Study, access to personal genomic risk testing led to improvements in some melanoma prevention and early detection behaviors. Purpose: We aimed to examine the hypothesized psychosocial mediators of the effects observed in the trial. Methods: Australians of European ancestry without melanoma and aged 18-69 years were recruited via the national Medicare database and randomized to receive personal genomic risk information or usual care (N=1,025). Questionnaires were administered at baseline, 1 month post-intervention, and 12 months post-baseline to assess self-reported prevention and early detection behaviors and psychosocial measures. To identify potential mediators, we first evaluated the intervention's effect on psychosocial measures and the associations between psychosocial measures and behavioral outcomes. We then estimated the natural indirect effects (NIEs) and their 95% confidence intervals (CIs) to quantify the effects mediated by the potential mediators identified. Results: Among participants with high traditional melanoma risk, the intervention's effect on increased sun protection at 1 month was partially mediated by changes in perceived importance [NIE mean difference (95% CI): 0.02 (0.00, 0.04)] and perceived effectiveness [0.01 (0.00, 0.03)] of sun protection strategies. Among women, the intervention's effect on increased whole-body skin examinations at 1 month was partially mediated by perceived capability to engage in skin examinations [NIE odds ratio (95% CI): 1.08 (1.00, 1.29)] and perceived control over detecting a future melanoma [1.13 (1.03, 1.32)]. Conclusions: The effectiveness of precision prevention and early detection interventions may be enhanced by targeting key psychosocial mediators through tailored communication of personal melanoma risk.

4
Development and validation of a digital pathology artificial intelligence (DPAI)-based biomarker predicting risk of Gleason grade group reclassification for patients who are candidates for active surveillance

Mabey, B.; Lenz, L. H.; Schiewer, M. J.; Rayford, W.; Muhammad, H.; Huang, W.; Finch, R.; Nakamoto, C.; Kouros-Mehr, H.; Jasper, J.; Basu, H.; Feng, C.; Sharma, A.; Wilding, G.; Roy, R.; Muzzey, D.; Gutin, A.

2026-05-20 oncology 10.64898/2026.05.15.26353328 medRxiv
Top 0.1%
10.1%

Aims: Active surveillance (AS) allows selected men with localized prostate cancer to defer curative therapy and reduce treatment morbidity. Conversion from AS to treatment is commonly triggered by Gleason grade group (GGG) upgrading on confirmatory biopsy. We developed and validated a digital pathology artificial intelligence (DPAI) biomarker to predict GGG upgrading in AS-eligible patients. Materials & Methods: The DPAI model was trained using histopathology image features from diagnostic biopsies of 998 patients and validated in an independent cohort of 296 patients meeting criteria for AS. Logistic regression estimated the probability of confirmatory-biopsy GGG increase, and feature selection identified the most predictive variables. Results: AI-GUR (Artificial Intelligence-Gleason Upgrade Risk) predicted GGG reclassification at confirmatory biopsy (OR 1.60; p=0.0003), and provided information beyond conventional stratification (risk group, CAPRA) and cribriform morphology (all p<0.01). Predicted risks were similar across time from diagnosis (~10-15% to ~85% at 1, 1.5, or 2 years; p for time=0.50), consistent with initial biopsy mischaracterization rather than time-dependent progression. Conclusions: AI-GUR provides individualized estimates of confirmatory-biopsy GGG upgrading for AS candidates. Using DPAI may improve shared decision-making by complementing standard clinicopathologic tools and molecular testing using the same biopsy specimen, while informing the likelihood of grade upgrade at confirmation.

5
Comparative fine-mapping of breast cancer susceptibility loci using summary statistics methods and multinomial regression

O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.

2026-04-22 epidemiology 10.64898/2026.04.21.26351364 medRxiv
Top 0.1%
10.0%

Summary statistics fine-mapping methods offer advantages over classical methods, including avoiding data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP to published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample LD. Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L), with higher values reducing recall and no single L maximising performance. At optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer, respectively. Thirty MNR sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.

6
Metabolites from blood and formalin-fixed, paraffin-embedded tissue from participants with low- and high-grade prostate cancer: a pilot study

Graff, R. E.; Bengtsson, H. L.; Suh, J. H.; Olshen, A. B.; Wang, E. Y.; Allen, R. M.; Van Blarigan, E. L.; Kenfield, S. A.; Cowan, J. E.; Carroll, P. R.; Simko, J.; Chan, J. M.

2026-03-19 epidemiology 10.64898/2026.03.12.26348192 medRxiv
Top 0.1%
8.7%

Background: Identifying metabolites associated with prostate cancer (PC) aggressiveness may elucidate mechanisms underlying disease severity. Doing so for plasma and formalin-fixed, paraffin-embedded (FFPE) tissue could accelerate discovery. In this cross-sectional pilot study, we generated hypotheses for further exploration by assessing associations between plasma metabolites and Gleason score in individuals with PC and evaluating correlations between plasma and FFPE metabolite levels. Methods: We examined plasma and FFPE samples from 10 individuals with Gleason score 7 (six 3+4, four 4+3) and nine individuals with Gleason score 9 (six 4+5, three 5+4) tumors from a convenience sample of 19 men with PC. We measured the relative abundance of polar metabolites at the time of radical prostatectomy. We used linear models of log2 fold changes to examine plasma metabolite levels relative to pathologic tumor grade. Relationships among metabolite levels measured in plasma and FFPE tumor tissue within individuals across metabolites were examined using Pearson correlations. Results: Among 18 plasma metabolites selected a priori because of prior associations with PC aggressiveness, serine (p=0.0051) and ornithine (p=0.036) levels were higher in individuals with Gleason 9 than Gleason 7 PC. After multiple testing correction, however, no associations were statistically significant. The median correlation between levels in plasma and FFPE tumor tissue was 0.45 (range: 0.40-0.53) for the 94 metabolites measured in both biospecimens. Conclusions: Plasma serine and ornithine demonstrated the largest differences between individuals with Gleason 7 and Gleason 9 PC. Metabolite levels in FFPE prostate tissue samples were moderately correlated with plasma levels. Future studies in larger samples are needed to further explore the hypotheses generated by this study.

7
Allostatic Load in Endometrial Cancer Disparities

Bey, G. S.; Bowen, M. B.; Wu, S.; Boykin, M.; Bernard, L.; Zhang, Q.; Melendez, B.; Celestino, J.; Batsis, J. A.; Sun, C.; Lin, F.-C.; Yates, M. S.

2026-06-11 oncology 10.64898/2026.06.06.26355062 medRxiv
Top 0.1%
8.6%

Background: Endometrial cancer incidence and mortality are increasing, particularly among Black women and for aggressive subtypes. Allostatic load (AL), a composite measure of physiologic dysregulation across metabolic, cardiovascular, and immune systems, varies by racial category and tumor subtype in other cancers. Endometrial cancer is strongly associated with obesity, and it is unknown whether AL scores maintain sufficient heterogeneity to evaluate differences across subgroups or with clinical outcomes. Objective: To describe the performance of AL scoring in endometrial cancer patients and examine associations with tumor characteristics (grade/histology) and survival outcomes. Methods: We evaluated AL among 398 participants newly diagnosed with endometrial cancer. AL score was calculated by assigning 1 point for each "high-risk" value (by clinical reference range or distribution-based) for 15 biologic variables for vital signs, anthropometrics, blood-based biomarkers, and medical comorbidities. Results: Distribution-based thresholds for variables were used to preserve heterogeneity in this obesity-dominant context. Overall, 68.7% of Black women had high AL compared to White (56.7%), Hispanic (56.7%), and other race (32.3%) women. Decision tree analyses revealed grade-dependent associations between AL and survival. For women with low-grade tumors, higher AL was associated with poorer overall survival. For high-grade tumors, intermediate AL (≥4, <8) was associated with the shortest overall survival. Black women with low-grade disease experienced shorter progression-free survival regardless of AL. Conclusions: AL scoring maintains heterogeneity despite high obesity prevalence in endometrial cancer. Varying relationships between AL and survival by tumor grade and ethnoracial group suggest cumulative physiologic burden and social/structural factors may jointly shape endometrial cancer disparities.
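The scoring rule described in the Methods above (1 point per "high-risk" value, summed across variables) can be sketched as below. This is an illustrative sketch only: the variable names, cutoffs, risk directions, and the handling of missing values are all hypothetical assumptions, not the 15 variables or thresholds used in the study.

```python
def allostatic_load(values, thresholds):
    """Count how many variables fall in their high-risk range.

    values:     mapping of variable name -> measured value (None if missing)
    thresholds: mapping of variable name -> (cutoff, direction), where
                direction is "high" (risk at or above the cutoff) or
                "low" (risk at or below the cutoff)
    """
    score = 0
    for name, (cutoff, direction) in thresholds.items():
        value = values.get(name)
        if value is None:
            continue  # assumption: a missing value contributes 0 points
        if direction == "high" and value >= cutoff:
            score += 1
        elif direction == "low" and value <= cutoff:
            score += 1
    return score


# Hypothetical cutoffs for three example variables (not the study's).
example_thresholds = {
    "systolic_bp_mmHg": (140, "high"),
    "hdl_mg_dL": (40, "low"),
    "crp_mg_L": (3.0, "high"),
}
example_patient = {"systolic_bp_mmHg": 150, "hdl_mg_dL": 55, "crp_mg_L": 4.2}

print(allostatic_load(example_patient, example_thresholds))
```

With distribution-based scoring, the cutoffs would instead be derived from the cohort itself (for example, a chosen percentile of each variable), which is what the abstract says was done to preserve heterogeneity.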

8
Neighborhood Deprivation and Racial Disparities in Metastatic Prostate Cancer at Diagnosis: A Population-Based Study in Ohio

Payne, J. Y.; Rhodes, S.; Shoag, J.; Rothberg, M.; Le, P.; Cullen, J.; Hartman, H.

2026-06-03 epidemiology 10.64898/2026.06.02.26354723 medRxiv
Top 0.1%
8.4%

Background: Prostate cancer survival varies by stage at diagnosis, and Black men experience a disproportionate burden of advanced disease. We examined whether neighborhood deprivation, measured by Area Deprivation Index (ADI), contributes to racial differences in metastatic presentation. Methods: We conducted a population-based study of men diagnosed with prostate cancer in the Ohio Cancer Incidence Surveillance System from 1996 to 2016. The primary endpoint was distant-stage disease at diagnosis. Generalized additive models assessed nonlinear associations of ADI and diagnosis year with metastatic risk. Inverse probability of treatment weighting (IPTW) models estimated odds ratios comparing Black with White men after sequential adjustment for diagnosis year, age, insurance, and ADI. Results: Among 135,095 men, 18,690 were Black and 116,405 were White. Distant-stage disease occurred in 7.0% of Black men and 5.0% of White men. Black men had higher median ADI (60.9 vs. 47.3). Medicaid-insured men had the highest unadjusted odds of metastatic presentation (OR, 4.68; 95% CI, 4.13-5.31), exceeding uninsured men (OR, 2.91; 95% CI, 2.54-3.34). In IPTW models without age adjustment, the odds ratio decreased from 1.54 to 1.24 after adding insurance and ADI. In age-adjusted IPTW models, the odds ratio decreased from 1.79 to 1.41 after adding insurance and ADI. Generalized additive models showed increasing metastatic risk at higher ADI values and after 2008. Conclusions: Neighborhood deprivation and insurance-related access explained part, but not all, of the excess odds of metastatic diagnosis among Black men. Impact: Integrating ADI into cancer surveillance may improve identification of populations at risk for late-stage diagnosis.

9
Connecting Baseline Immune Exhaustion in Hot Tumors to Oral Cancer Recurrence and Nodal Metastasis

Shaikh, S.; Basu, S.; Hajihosseini, M.; Nandy, S. K.; Moorthy, M.; Arun, I.; Lali, B. S.; Arun, P.; Mukherjee, G.; Pyne, S.

2026-05-30 oncology 10.64898/2026.05.27.26354295 medRxiv
Top 0.1%
8.4%

Background: The use of immune checkpoint inhibitors (ICIs) in the treatment of cancer has rapidly expanded over the last decade. However, there are several knowledge gaps in understanding how tumor cells evade the immune system. There is a paucity of data in HPV-negative oral cancer, particularly of the gingivobuccal region. Understanding the mechanism of immune system evasion in this cancer is vital for improving patient outcomes. Methods: We characterized the baseline immune milieu of oral cancer using immunohistochemistry (IHC) on whole tumor sections from 124 cases. Tumors were classified as hot or cold and further stratified into high-risk and low-risk groups. High-risk patients included those with lymph node metastasis at diagnosis/recurrence or distant metastasis within 2 years of treatment completion. Patients without these features were categorized as low risk. Validation by RNA-Seq and Joint Enrichment Analysis of Oncogenic and Immunologic Pathways was carried out in a subset of 46 cases. Results: Hot high-risk (HHR) tumors (by IHC) were distinguished by elevated PD-L1 expression and reduced NK-cell, PD1, and CTLA-4 expression. There was no difference in the expression levels of CD3+, CD8+, granzyme, or perforin compared to hot low-risk tumors, findings that align with the definition of hot tumors. RNA-Seq revealed a gene signature associated with exhausted T-cells in HHR tumors. Gene and pathway analyses identified differential upregulation of isoform-specific TOX, TCF, CXCR, RUNX, IRF, BRD and BCL6 genes, implicating immune cell exhaustion and tumor aggressiveness. Significantly downregulated genes included PDCD1, HAVCR2, ZAP70, and STAT, indicative of a disabled immune microenvironment. These findings support the view that a state of immune exhaustion in HHR tumors is driven by progenitor exhausted T-cells and terminally exhausted T-cells, independent of PD1-TIM3.
Conclusion: These findings suggest that combining TOX/TCF/BCL6 inhibitors with immune checkpoint inhibitors in the adjuvant setting might benefit patients with hot high-risk tumors. Given the results, testing for a targeted exhaustion-related gene panel at diagnosis is recommended for oral cancers to stratify tumors as high-risk or low-risk. Larger validation studies and clinical trials are now warranted.

10
Predicting bladder cancer molecular subtypes linked to bacillus Calmette-Guerin response from histology images using deep learning

Khoraminia, F.; Olislagers, M.; de Jong, F. C.; Akram, F.; Nakauma Gonzalez, A.; Lichtenberg, D.; Stubbs, A.; Costello, J. C.; Rijstenberg, L.; van Leenders, G. J. L. H.; Vrieling, A.; Aben, K. K. H.; Kiemeney, L. A. L. M.; Hoedemaeker, R. F.; Bangma, C. H.; Vermeulen, S.; Litjens, G.; Khalili, N.; Zuiverloon, T. C. M.

2026-05-06 oncology 10.64898/2026.05.05.26352375 medRxiv
Top 0.1%
6.9%

Background and objective: High-risk non-muscle-invasive bladder cancer (HR-NMIBC) is treated with transurethral resection and intravesical BCG instillations, yet ≈50% recur and 20% progress to invasive disease. Although molecular subtyping, e.g., BCG-response-subtype (BRS), is associated with progression risk and may aid risk stratification, it is costly and time-consuming. Intratumoral heterogeneity complicates accurate subtyping. To address these challenges, we developed a deep-learning model that predicts BRS from routine hematoxylin-eosin-stained images. We verified the model's area-by-area predictions against tissue-level gene-expression maps. Methods and participants: Hematoxylin-eosin-stained images from 231 HR-NMIBC patients with known BRS were used to develop a deep-learning model through cross-validation, then validated in 83 independent samples. The model's spatial predictions were assessed using spatial transcriptomics to map gene expression to tissue locations in five HR-NMIBC tumors. Outcome measurements and statistical analysis: Discriminative ability for BRS3 vs. BRS1/2 was measured by AUC. Spatial alignment was assessed by calculating Pearson and Spearman correlation coefficients between model predictions and BRS fractions; significance was assessed through permutation analysis. Key findings and limitations: The trained algorithm achieved AUC of 0.79 (development) and 0.71 (external) to detect BRS3 vs. BRS1/2. Tile-level correlation between model output and molecular labels was significant (Pearson r = 0.33-0.44; p ≤ 0.002). Limitations include retrospective sampling and limited spatial transcriptomic cases. Conclusions and clinical implications: Our trained algorithm showed potential to stratify HR-NMIBC patients by clinically relevant BCG-response subtypes using routine hematoxylin-eosin-stained images and showed predicted spatial heterogeneity comparable to molecular profiling. Prospective validation is required before any clinical implementation.
Patient summary: Standard pathology images contain hidden details related to a tumor's molecular subtype. We trained an AI model to read these routine images and identify specific bladder cancer subtypes associated with poor response to BCG therapy. This approach may help reveal molecular subtype-associated information from routine pathology images, without additional laboratory procedures.

11
Genetic prediction of long-term effects of aromatase inhibition on cancer and non-neoplastic disease risk

Ray, D.; Bate, T.; O'Mara, T. A.; Sasieni, P.; Gunter, M. J.; Martin, R. M.; Smith-Byrne, K.; Haycock, P.; Yarmolinksy, J.

2026-04-29 epidemiology 10.64898/2026.04.28.26351848 medRxiv
Top 0.1%
6.8%

Background: Anastrozole, an aromatase inhibitor, is approved for breast cancer prevention in high-risk women. The long-term effects of aromatase inhibition, including its repurposing potential to other cancers, possible adverse effects, and treatment effect heterogeneity across patient subgroups, remain unclear. Methods: We used the rs727479 variant in CYP19A1 to mimic the effect of long-term pharmacological aromatase inhibition. To evaluate repurposing opportunities, genetic association data on five cancers (211,386 cases, 684,665 controls) were obtained from genome-wide association study consortia. Potential adverse effects were evaluated in a phenome-wide association study (PheWAS) of 449 health-related traits in 162,360 postmenopausal women in the UK Biobank. Effects were investigated across clinically relevant subgroups in the UK Biobank including those defined by body mass index (BMI). Results: Genetically-proxied aromatase inhibition was associated with reduced risk of ER+ breast cancer (OR: 0·78, 95% CI: 0·67-0·92) and decreased heel bone mineral density (-0·32 SD change, 95% CI: -0·36, -0·28). When examining the repurposing potential of anastrozole to other cancers, we found that genetically-proxied aromatase inhibition reduced endometrial cancer risk (OR: 0·34, 95% CI: 0·26-0·44). In PheWAS, genetically-proxied aromatase inhibition was associated with 6 outcomes (PFDR<0·05) including reduced risk of endometrial polyps (OR: 0·58, 95% CI: 0·45-0·74) and postmenopausal bleeding (OR: 0·67, 95% CI: 0·54-0·83), with stronger effects in women with higher BMI (PLRT=1·26×10⁻³ and 0·02, respectively). Conclusion: Our genetic analyses recapitulate known effects of aromatase inhibition on breast cancer risk and highlight potential repurposing for endometrial cancer prevention.
Limited evidence of adverse effects beyond bone mineral density was observed, and subgroup analyses suggested that women with higher BMI may experience greater protection against endometrial conditions.

12
Novel Genetic Risk Loci for Pancreatic Ductal Adenocarcinoma Identified in a Genome-wide Study of African Ancestry Individuals

Vergara, C.; Ni, Z.; Zhong, J.; McKean, D.; Connelly, K. E.; Antwi, S. O.; Arslan, A. A.; Bracci, P. M.; Du, M.; Gallinger, S.; Genkinger, J.; Haiman, C. A.; Hassan, M.; Hung, R. J.; Huff, C.; Kooperberg, C.; Kastrinos, F.; LeMarchand, L.; Lee, W.; Lynch, S. M.; Moore, S. C.; Oberg, A. L.; Park, M. A.; Permuth, J. B.; Risch, H. A.; Scheet, P.; Schwartz, A.; Shu, X.-O.; Stolzenberg-Solomon, R. Z.; Wolpin, B. M.; Zheng, W.; Albanes, D.; Andreotti, G.; Bamlet, W. R.; Beane-Freeman, L.; Berndt, S. I.; Brennan, P.; Buring, J. E.; Cabrera-Castro, N.; Campa, D.; Canzian, F.; Chanock, S. J.; Chen, Y.;

2026-04-22 genetic and genomic medicine 10.64898/2026.04.21.26351329 medRxiv
Top 0.1%
6.8%

Pancreatic cancer disproportionately affects Black individuals in the United States, but they have limited representation in genetic studies of pancreatic ductal adenocarcinoma (PDAC). To address this gap, we performed admixture mapping and genome-wide association analysis (GWAS) in genetically inferred African ancestry individuals (1,030 cases and 889 controls). Admixture mapping identified three regions with a significantly higher proportion of African ancestry in cases compared to controls (5q33.3, 10p1, 22q12.3). GWAS identified a genome-wide significant association at 5p15.33 (CLPTM1L, rs383009:T>C, T Allele Frequency=0.51, OR: 1.45, P value=1.24×10⁻⁸), a locus previously associated with PDAC. Known loci at 5p15.33, 7q32.3, 8q24.21 and 7q25.1 also replicated (P value <0.01). Multi-ancestral fine-mapping identified two potential causal SNPs (rs3830069 and rs2735940) at 5p15.33. Collectively, these findings identified novel PDAC risk loci and expanded our understanding of this deadly cancer in underrepresented populations, emphasizing the multifactorial nature of PDAC risk including inherited genetic and non-genetic factors. Statement of Significance: To understand how genetic variation contributes to PDAC risk in Black people in North America, we studied individuals of genetically-inferred African ancestry. We identified novel risk loci and differences in the contribution of known loci. This demonstrates that ancestry-informed genetic analyses improve our understanding of PDAC risk and enhance discovery.

13
Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv
Top 0.1%
6.7%

Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. 
These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

14
Whole Genome HPV Liquid Biopsy for Pan-HPV-Associated Cancer Detection and Viral Physical State Classification

Fisch, A. S.; Abruzzo, A. R.; Eldfors, S.; Das, D.; Wang, Q.; Lumaj, G.; Shukla, S.; Gockley, A. A.; Wo, J. Y.; Hong, T. S.; Russo, A. L.; Richmon, J. D.; Giap, F.; Alzumaili, B. A.; Faquin, W. C.; Sadow, P. M.; Faden, D. L.

2026-04-29 oncology 10.64898/2026.04.27.26350528 medRxiv
Top 0.1%
6.5%

Purpose: HPV-associated carcinomas (HPV+ cancers) account for 5% of all cancers. Circulating tumor HPV DNA (ctHPVDNA) assays for HPV+ cancer surveillance have limited prognostic utility at the time of cancer diagnosis. While HPV integration into the host genome is a proven tissue-based biomarker predicting poor clinical outcomes, existing clinically utilized ctHPVDNA assays cannot classify the viral physical state. Methods: We previously developed HPV-DeepSeek, a multi-feature HPV whole-genome sequencing liquid biopsy with 99% diagnostic accuracy at the time of HPV+ oropharynx cancer diagnosis. We test the diagnostic accuracy of HPV-DeepSeek in a cohort of 235 HPV+ cancers across nine anatomic sites and employ a novel blood-based computational classifier to infer HPV genome physical state from plasma, termed HPV-SIGNAL, to assess its prognostic potential. Results: HPV-DeepSeek demonstrated a sensitivity and specificity of 99%. In 181 eligible samples, HPV-SIGNAL identified four viral physical states: episomal-only (N = 69), episomal-rearranged (N = 48), integrated-mixed (N = 55), and integrated-clonal (N = 9), which were confirmed and further elucidated via three orthogonal tissue and blood approaches. Patients harboring integrated viral states in the blood exhibited significantly worse progression-free survival (HR 3.28, 95% CI 1.63-6.58, p = 0.00084) and overall survival (HR 2.98, 95% CI 1.16-7.64, p = 0.023) compared to patients with episomal states. Conclusion: HPV whole-genome sequencing liquid biopsy has high diagnostic accuracy across HPV+ cancer types and can be used to identify and classify HPV physical state from blood. Patients with integrated viral states detected in the blood demonstrated worse progression-free and overall survival, suggesting blood-based HPV physical state classification could be used as a prognostic tool at the time of cancer diagnosis.
Translational Relevance: Current circulating tumor HPV DNA assays for HPV-associated cancer surveillance have limited prognostic utility at the time of cancer diagnosis. While HPV integration into the host genome is a proven tissue-based biomarker predicting poor clinical outcomes, existing circulating tumor HPV DNA assays cannot classify the viral physical state. Here, we show that HPV-SIGNAL, a novel blood-based computational classifier to infer HPV genome physical state from plasma using output from HPV-DeepSeek, an HPV whole-genome sequencing liquid biopsy, accurately identifies and classifies HPV physical state from blood and is prognostic of progression-free and overall survival across HPV-associated cancer types.
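As a rough consistency check on the survival results above, a reported hazard ratio and its 95% CI can be converted back into an approximate two-sided p-value. The sketch below assumes a Wald test on the log hazard ratio with a CI that is symmetric on the log scale; the abstract does not state which test was used, and the function name `p_from_hr_ci` is illustrative rather than anything from the preprint.

```python
import math

def p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate Wald z and two-sided p-value from an HR and its 95% CI.

    Assumption: the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    # Recover the standard error of log(HR) from the CI width on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided standard normal tail probability.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values as reported in the abstract.
z_pfs, p_pfs = p_from_hr_ci(3.28, 1.63, 6.58)   # progression-free survival
z_os, p_os = p_from_hr_ci(2.98, 1.16, 7.64)     # overall survival

print(f"PFS: z = {z_pfs:.2f}, p = {p_pfs:.5f}")
print(f"OS:  z = {z_os:.2f}, p = {p_os:.3f}")
```

Under these assumptions, both recovered p-values land close to the reported ones (p = 0.00084 and p = 0.023); small differences are expected because the published HRs and CIs are rounded.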

15
Breast cancer polygenic risk score performance varies by socioeconomic status

Domian, H. I.; Tian, X.; Ong, D.; Hamilton, L.; Shieh, Y.; Musharoff, S. A.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354819 medRxiv
Top 0.1%
6.4%

Background: Polygenic risk scores (PRS) for breast cancer are increasingly used for risk stratification to inform screening and prevention. However, for PRSs to be equitable and clinically useful, they need to perform well across diverse populations. While PRS performance is known to be ancestry-dependent, it is not well understood how environmental context, such as that of socioeconomic status (SES), affects PRS transferability. Here, we assess whether SES, measured via self-reported household income, modifies breast cancer PRS performance and, if so, whether socioeconomic context contributes predictive information beyond genetic risk alone. Methods: We used the US-based All of Us biobank to evaluate how SES impacts breast cancer PRS performance. First, we quantified changes in breast cancer PRS performance by modeling a commonly cited polygenic score for breast cancer previously described by Mavaddat et al. with SES. We then reestimated the genetic effect sizes of the 3,820 variants from Mavaddat et al. in All of Us with and without income as a covariate. Because social determinants of health affect breast cancer detection and outcomes, we stratified analyses by socially defined populations on the basis of self-identified race and ethnicity. We further stratified individuals whose self-identified race is White ("White") into three SES groups (high, middle, low) based on self-reported income and re-estimated genetic effect sizes to create SES-specific PRSs. We then applied these PRSs to White participants, the largest group in the study, and to Black or African American ("Black") and Hispanic or Latino ("Hispanic") participants, groups underrepresented in breast cancer research. Model discrimination between cases and controls was measured by area under the curve (AUC).
Results: We analyzed 163,715 women from the All of Us biobank, which included 8,833 breast cancer cases (6,619 White, 1,178 Black, and 1,036 Hispanic), with relative income available for a subset of these cases (5,525 White, 848 Black, and 566 Hispanic). The ancestry-dependent performance of the breast cancer PRS described in Mavaddat et al. was replicated in All of Us. In Black individuals, this PRS (AUC and 95% CI: 0.576 [0.571, 0.582]) produced a similar increase in AUC as relative income (AUC: 0.573 [0.568, 0.577]) when added to an age-only model. Incorporating income with PRS, age, and genetic PCs 1-3 improved AUC by 0.007 in White Americans and 0.018 in Black Americans (both p < 10^-11), while attenuating the contribution of PRS in the full model. PRS performance also varied among SES categories. Notably, PRSs with variant effect sizes that were recalibrated in low-SES White participants performed best in low-SES White participants (AUC: 0.605 [0.583, 0.628]) and Black Americans (AUC: 0.588 [0.586, 0.591]), both better than performance in high-SES White Americans (AUC: 0.579 [0.577, 0.580]) and middle-SES White Americans (AUC: 0.578 [0.569, 0.586]). Conclusion: Socioeconomic context, measured by income, significantly impacts the transferability of a PRS for breast cancer within and among groups defined by self-identified race and ethnicity. Accounting for SES improves PRS performance, most notably in Black Americans and low-SES White individuals.
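The AUC values above measure discrimination: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with 0.5 meaning no better than chance. A minimal sketch of that rank-based definition follows. The scores are made-up toy numbers, not study data, and `auc_from_scores` is an illustrative name; the abstract does not say how the authors computed AUC.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs where the case scores higher.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores for illustration only.
cases = [0.9, 0.4, 0.7, 0.2]
controls = [0.1, 0.5, 0.3, 0.6]
auc = auc_from_scores(cases, controls)
print(f"AUC = {auc:.4f}")  # 11 of 16 pairs favor the case -> 0.6875
```

Read this way, an AUC of 0.576 means the score ranks a case above a control in roughly 58 of every 100 case-control pairs, which is why differences of 0.01 to 0.02 between models are small in absolute terms even when statistically significant.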

16
Integrated T-Cell Receptor Repertoire and Tumor Immunogenicity Profiling Reveals Distinct Immunogenomic States in Endometrial Cancer

Aversa, I.; Abatino, A.; Isabello, A.; Gallo, R.; Isdraele, L.; Straface, T.; Zullo, F. M.; Guida, M.; Saccone, G.; Fiume, G.; Venturella, R.; Viglietto, G.; Cuda, G.; Costanzo, F.; Zullo, F.; Palmieri, C.

2026-06-10 oncology 10.64898/2026.06.08.26355191 medRxiv
Top 0.1%
6.3%

Background: Endometrial cancer exhibits marked molecular and immune heterogeneity that is only partially explained by established genomic biomarkers. We investigated whether T cell receptor (TCR) repertoire architecture captures complementary dimensions of antitumor immunity beyond conventional molecular classification. Methods: Paired tumor and peripheral blood samples from eight patients with molecularly characterized endometrial cancer underwent TCR repertoire profiling. Diversity, clonality, and tumor-blood overlap metrics were integrated with genomic variables, including tumor mutational burden (TMB), genomic instability metric (GIM), and POLE status. Principal component analysis and correlation analyses were used to identify major dimensions of repertoire organization. Composite Immune Focusing and Immune Sharing Scores were derived to summarize dominant repertoire patterns. Results: The first two principal components explained 70.1% of total repertoire variance and revealed substantial heterogeneity independent of histological subtype. TMB was strongly associated with reduced repertoire diversity and increased clonal dominance, resulting in a robust association with the Immune Focusing Score (ρ = 0.88, p = 0.004). POLE-mutated tumors occupied the extreme end of this focusing continuum. In contrast, genomic instability was associated with increased tumor-blood repertoire overlap and preserved diversity, reflected by a strong correlation between GIM and the Immune Sharing Score (ρ = 0.76, p = 0.027). The two immune scores showed minimal correlation with each other (ρ = -0.24, p = 0.57), indicating that they capture largely independent aspects of immune organization. Conclusion: Integrative analysis of TCR repertoire architecture and tumor genomics identifies distinct immunogenomic states in endometrial cancer that are not fully captured by conventional molecular classification.
If validated in larger cohorts, immune focusing and immune sharing metrics may provide complementary biomarkers for patient stratification and immunotherapy-oriented precision oncology.
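The correlations above are rank correlations on only eight patients, so it helps to see how such a coefficient is computed and how large the associated test statistic is at that sample size. The sketch below is illustrative only: it assumes Spearman's coefficient with average ranks for ties and the common t-approximation with n - 2 degrees of freedom. The abstract does not state how its p-values were obtained, and the toy vectors are not study data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def t_statistic(rho, n):
    """t-approximation for testing rho = 0, with n - 2 degrees of freedom."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Toy example (hypothetical data).
rho_toy = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

# Test statistics implied by the reported coefficients with n = 8.
t_focus = t_statistic(0.88, 8)
t_share = t_statistic(0.76, 8)
print(rho_toy, round(t_focus, 2), round(t_share, 2))
```

With six degrees of freedom, both implied statistics exceed the two-sided 5% critical value of about 2.45, which is in line with the reported p-values, but with n = 8 a single patient can shift a rank correlation substantially.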

17
Retrospective cohort study extracting coexisting background breast-lesion features from stage I-III invasive breast cancer

Lim, R. J. Y.; Nitar, P.; Lau, K. W.; Leong, L. C. H.; Lim, G. H.; Tan, V. K. M.; Tan, B. K. T.; Tan, E. Y.; Goh, S. S. N.; Hartman, M.; Wong, F. Y.; Li, J.; Joint Breast Cancer Registry

2026-05-22 oncology 10.64898/2026.05.19.26353633 medRxiv
Top 0.1%
6.0%

Background: Background breast features are frequently noted in pathology reports alongside invasive breast cancer but rarely factor into prognosis or treatment decisions. Their relationship to tumor characteristics and patient outcomes remains incompletely characterised. Methods: We conducted a retrospective cohort study of 7,603 patients with Stage I-III invasive breast cancer (diagnosed 1991-2022, age <80 years) from the Joint Breast Cancer Registry in Singapore. Natural language processing (NLP) was applied to 9,754 free-text pathology reports to extract co-existing background breast features, with accuracy validated by dual-reviewer assessment of 200 reports. Unsupervised hierarchical clustering grouped extracted features into three categories. Associations with tumor characteristics were assessed by multinomial logistic regression, and ten-year overall survival by Cox proportional hazards models (median follow-up 9.6 years; 620 deaths). Results: Here we show that NLP-based extraction of background breast features from routine pathology reports achieves an accuracy of over 90% across features. Lobular neoplasia and benign proliferative changes are associated with less aggressive tumor characteristics, whereas early neoplastic and papillary lesions are more prevalent in HER2-enriched and luminal B tumor subtypes. Benign proliferative changes are associated with better survival in age- and year-adjusted models (hazard ratio 0.91, 95% CI 0.86-0.97), but this association is attenuated after adjustment for stage and subtype. Conclusions: NLP-enabled extraction of background breast features from pathology text is feasible at scale. These features reflect tumor biology but do not independently add prognostic information beyond established clinical variables.

18
Documented clinical genetic testing among carriers of hereditary breast and ovarian cancer variants: Ancestry and socioeconomic disparities in the All of Us research program

Yerukala Sathipati, S.; Scott, H.

2026-06-10 oncology 10.64898/2026.06.09.26355262 medRxiv
Top 0.1%
5.0%

Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing was disparate in the different ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
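The abstract above reports carrier prevalence with Wilson 95% CIs. As a hedged illustration of how a Wilson score interval behaves for a small proportion, the sketch below applies the standard formula to the testing counts quoted in the abstract (74 of 5,878 carriers). The authors report Wilson intervals for prevalence only, so using the formula on the testing proportion is our illustration, and `wilson_ci` is a hypothetical helper name.

```python
import math

def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    # The interval is centered on a value pulled slightly toward 0.5.
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Documented testing among carriers, counts as quoted in the abstract.
lo, hi = wilson_ci(74, 5878)
print(f"{74 / 5878:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

Unlike the simple normal-approximation interval, the Wilson interval never extends below zero, which matters for rare outcomes such as the documented testing rates discussed here.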

19
Comparing Gleason Pattern 4 Measurement Approaches on Prostate Biopsy Using Machine Learning: A Proof-of-Principle Study

Buzoianu, M. M.; Yu, R.; Assel, M.; Bozkurt, A.; Aghdam, H.; Fine, S.; Vickers, A.

2026-04-24 oncology 10.64898/2026.04.23.26351615 medRxiv
Top 0.1%
4.9%

Objective: To demonstrate the proof of principle that machine learning (ML) can be used to quantify Gleason Pattern (GP) 4 on digitized biopsy slides using multiple measurement approaches, allowing direct comparison of their prognostic performance. Methods: We assembled a convenience sample of 726 patients with grade group 2-4 prostate cancer on systematic biopsy who underwent radical prostatectomy between 2014 and 2023. Digitized biopsy slides were analyzed using a machine-learning algorithm (PAIGE-AI) to quantify GP4 using multiple measurement approaches, particularly with respect to how gaps between cancer foci ("interfocal stroma") were handled. GP4 extent was quantified using linear measurements or a pixel-based area metric. Discrimination of each GP4 quantification approach, along with Grade Group (GG), was assessed for adverse radical prostatectomy pathology and biochemical recurrence (BCR). Results: We identified 15 different quantification approaches and observed differences in their discrimination. The highest discrimination was achieved by the pixel-counting method (AUC 0.648). GP4 quantification outperformed GG for predicting adverse pathology (AUC 0.627 vs 0.608). Amount of GP3 was non-predictive once GP4 was known. These findings were consistent for BCR. Conclusions: We were able to measure slides using 15 distinct measurement approaches and replicated prior findings using ML to quantify GP4. Our findings support the use of ML as a research tool to compare different GP4 quantification approaches. We intend to use our method on larger cohorts to determine which measurement approach best predicts oncologic outcome.

20
Assessing potential harms from screening overdiagnosis and false positives with multicancer early detection tests

Malagon, T.; Russell, W. A.; Burnier, J. V.; Dickinson, K.; Brenner, D.

2026-04-13 oncology 10.64898/2026.04.09.26348927 medRxiv
Top 0.1%
4.9%

Background: Multicancer early detection tests could be used for cancer screening, but may lead to harms, including false positive results and overdiagnosis of indolent tumours that would not have become clinically evident during that person's lifetime. We assessed the potential for these screening harms in the context of future population-based screening with a multicancer early detection test. Methods: We used a microsimulation model to assess potential population-level impacts of screening at ages 50-75 years with a multicancer early detection test in Canada. We assumed high test specificity (97-99.1%) and test sensitivity increasing with cancer stage. The model includes latent indolent cancers that would not be diagnosed within that person's lifetime but can be overdiagnosed through screen-detection. We calculated the yearly and cumulative lifetime probabilities of screening overdiagnosis and false positive test results, assuming a range of preclinical screen-detectable periods (2-5 years). Results: An estimated 2.1-6.0% of all yearly screen-detected cancers with a multicancer screening test were predicted to be overdiagnoses across scenarios. The proportion of overdiagnosis varied by site, and strongly increased with age, going from 1% at age 50 to over 10% of screen-detected cancers by age 75. The test positive predictive value ranged from 15.9% to 77.6%, meaning that there could be 0.3-5.3 false positives with no underlying cancer for every true cancer case detected by the test. Conclusion: Population-level multicancer screening with a multicancer early detection test would likely not lead to substantial screen-related overdiagnosis. Healthcare systems should consider how screening false positives may increase their diagnostic service caseload.
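The step from positive predictive value to "false positives per true cancer case" in the results above is simple arithmetic: if PPV is the share of positive tests that are true positives, then the ratio of false to true positives is (1 - PPV) / PPV. A small sketch using the two PPV bounds quoted in the abstract (the function name is illustrative):

```python
def false_positives_per_true_positive(ppv):
    """Number of false positives expected for each true positive, given PPV."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return (1 - ppv) / ppv

# PPV bounds as reported in the abstract.
worst_case = false_positives_per_true_positive(0.159)
best_case = false_positives_per_true_positive(0.776)
print(f"PPV 15.9% -> {worst_case:.1f} false positives per true positive")
print(f"PPV 77.6% -> {best_case:.1f} false positives per true positive")
```

This reproduces the 0.3-5.3 range stated in the abstract and makes clear that the diagnostic workload from false positives is driven by the low end of the PPV range.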